Dataset Overview

For this analysis, we used two primary datasets sourced from Kaggle, covering the 11 months from January 2021 to November 2021. The first dataset contains daily vaccination data for over 220 countries, and the second describes the vaccines supplied by 8 different manufacturers across the world. We also used an additional dataset to retrieve ISO country codes for the analysis and graphical interpretation.

The variables in the first dataset are location, date, total vaccinations, fully vaccinated people, daily vaccinations, and daily vaccinations per hundred and per million.

The second dataset has country data, the vaccine manufacturers, and the total vaccinations supplied by each manufacturer.

Data preparation

After gathering the data, we filtered the raw data by removing redundant columns and keeping only the required variables: date, country, total vaccinations, daily vaccinations, and daily vaccinations per million. For the first dataset, we grouped the data by country using the dplyr library in R and retrieved the overall vaccination total of each country. For the second dataset, we grouped by vaccine manufacturer and location to find the total vaccinations for each manufacturer across all the countries covered. Finally, we performed an inner join between the first two datasets and the third dataset to retrieve the ISO codes of the respective countries.
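The preparation steps above can be sketched roughly as follows. This is a minimal illustration in base R (`aggregate()` and `merge()`) rather than the dplyr calls used in the project, and every data frame, column name, and value is a hypothetical stand-in. It also assumes that `total_vaccinations` is a cumulative count, so the per-country total is taken as the maximum.

```r
# Toy stand-ins for the raw data; all names and values are hypothetical.
vaccinations <- data.frame(
  country = c("A", "A", "B", "B", "C"),
  date = as.Date(c("2021-01-01", "2021-01-02", "2021-01-01",
                   "2021-01-02", "2021-01-01")),
  total_vaccinations = c(100, 250, 40, 90, 10),
  daily_vaccinations = c(100, 150, 40, 50, 10),
  source_url = "n/a"  # example of a redundant column
)
iso_codes <- data.frame(country = c("A", "B"), iso_code = c("AAA", "BBB"))

# 1. Keep only the required variables.
keep <- c("country", "date", "total_vaccinations", "daily_vaccinations")
vaccinations <- vaccinations[, keep]

# 2. Group by country. Assumption: the total is cumulative, so take the max.
country_totals <- aggregate(total_vaccinations ~ country,
                            data = vaccinations, FUN = max)

# 3. Inner join: merge() keeps only countries present in both data frames.
country_iso <- merge(country_totals, iso_codes, by = "country")
print(country_iso)
```

Country "C" has no ISO code in the toy lookup table, so the inner join drops it. The same behavior would silently remove countries from the real data if their names did not match between datasets.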

Goal of Analysis

The analysis examines daily vaccination trends, total vaccinations across the world, and how vaccine distribution differs between high-GDP and low-GDP countries.

General analysis of country vaccinations:

Top 5 vaccinated countries

Using the country totals obtained by grouping the dataset, we can find the top 5 countries with the highest number of vaccinations overall.
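As a rough sketch of this ranking step, the grouped totals can be sorted in descending order and the first five rows kept. The data frame below is hypothetical and only illustrates the mechanics.

```r
# Hypothetical per-country totals (stand-in for the grouped data).
country_totals <- data.frame(
  country = c("A", "B", "C", "D", "E", "F", "G"),
  total_vaccinations = c(50, 300, 120, 10, 75, 200, 5)
)

# Sort by total vaccinations, largest first, and keep the top 5 rows.
ranked <- country_totals[order(country_totals$total_vaccinations,
                               decreasing = TRUE), ]
top5 <- head(ranked, 5)
print(top5)
```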

The above graph, “Top 5 countries in vaccination”, shows the progress of Brazil, China, India, Indonesia, and the United States. China leads with about 2.4 billion vaccinations, followed by India and then the United States, while Brazil and Indonesia have almost similar totals.

Surprisingly, only one of these countries is considered a developed country; the rest are developing countries.

Daily Vaccination rate of top 5 countries

We used the daily vaccinations data to plot the graph below, a box plot of the daily vaccinations (in millions) of the top 5 countries that shows the minimum, maximum, and median for each country. The third quartile of China's box plot is larger than that of the other countries, which tells us that China's daily vaccination numbers are more widely spread. India's daily vaccinations are more consistent than those of the other four countries.
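The quantities a box plot displays can also be computed directly. The sketch below uses simulated data for two hypothetical countries, "X" and "Y", and compares their quartiles and interquartile ranges; it is not the project's data or plotting code.

```r
set.seed(1)
# Simulated daily vaccinations (in millions) for two hypothetical countries:
# "X" is widely spread, "Y" is tightly clustered around its median.
daily <- data.frame(
  country = rep(c("X", "Y"), each = 100),
  daily_vaccinations = c(rexp(100, rate = 1 / 5),
                         rnorm(100, mean = 5, sd = 0.5))
)

# Five-number summary per country: min, Q1, median, Q3, max.
box_stats <- sapply(split(daily$daily_vaccinations, daily$country),
                    quantile, probs = c(0, 0.25, 0.5, 0.75, 1))

# Interquartile range (Q3 - Q1): a larger value means a more spread-out box.
iqr <- box_stats["75%", ] - box_stats["25%", ]
print(round(box_stats, 2))
print(round(iqr, 2))
```

A country whose daily numbers vary a lot gets a tall box (large interquartile range), while a country with steady daily numbers gets a short one.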

The box plots show the overall patterns of response of the top 5 countries during the second COVID wave. They provide a helpful way to visualize the range and other characteristics of the responses for a large group, which can also be inferred from the geo map presented below.

Manufacturers

Distribution of different vaccines across the world

We used Plotly to draw the pie chart. In the pie chart, we can observe that the largest portion of vaccinations was manufactured by Pfizer/BioNTech with 68.7%, followed by Moderna with 17% and Oxford/AstraZeneca with 8.52%. Johnson&Johnson and Sinovac cover almost identical portions with 2.71% and 2.666%, followed by the three smallest manufacturers, Sinopharm/Beijing, Sputnik, and CanSino, with shares as small as 0.192% and 0.0515%, over the 11 months of recorded data. We can infer that about 85% of the total vaccinations were supplied by Pfizer and Moderna, and the remainder is covered by the other six vaccine providers.
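The percentages in a pie chart like this are each manufacturer's total divided by the grand total. A minimal sketch with hypothetical manufacturer names and totals (not the project's figures):

```r
# Hypothetical total vaccinations per manufacturer.
manufacturer_totals <- c(VaccineA = 700, VaccineB = 200, VaccineC = 100)

# Share of each manufacturer in percent, largest first.
shares <- sort(round(100 * prop.table(manufacturer_totals), 2),
               decreasing = TRUE)

# Combined share of the two largest manufacturers.
top_two <- sum(shares[1:2])
print(shares)
print(top_two)
```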

Distribution of vaccines according to manufacturers amongst four different countries

This comparison shows how rich countries have access to more expensive vaccines like Pfizer and Moderna, while developing countries mainly have access to Sinovac, a vaccine from China. The developing countries Chile and Ecuador and the developed countries France and the United States have different major vaccine providers: the developing countries used Sinovac, whereas the developed countries used Pfizer to vaccinate most of their population. Moderna is the second leading vaccine provider in the developed countries, while Oxford/AstraZeneca is commonly available in all four countries.

The analysis of daily vaccination rates

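library(plotly)  # provides plot_ly(), layout(), and the %>% pipe
# Histogram of daily vaccinations per million, normalized to probabilities.
# The font list `t` is assumed to be defined earlier in the analysis.
plot_ly(covi_1, x = ~daily_vaccinations_per_million, type = "histogram", histnorm = "probability") %>%
  layout(title = 'Distribution of daily vaccination rate per million across the world', xaxis = list(title = 'Daily Vaccinations'), font = t,
         yaxis = list(title = 'Probability', range = c(0, 0.1)))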

The above graph shows the distribution of the daily vaccination rate per million for all the countries present in the dataset. As we can see, the distribution is positively (right) skewed.

## [1] "The mean of the daily vaccination is  3457.68035031955"
## [1] "The standard deviation of the daily vaccination is  4161.99330866102"

Central Limit Theorem

## Sample Size =  10  Mean =  3468.753  SD =  1325.401 
## Sample Size =  20  Mean =  3445.608  SD =  915.7248 
## Sample Size =  30  Mean =  3449.94  SD =  763.1634 
## Sample Size =  40  Mean =  3460.152  SD =  667.3912

Finding

By the central limit theorem, the mean of the sample means stays close to the mean of the original distribution (3457.68) regardless of the sample size, while their standard deviation shrinks roughly as the population standard deviation divided by the square root of the sample size. For example, for a sample size of 10, 4161.99/√10 ≈ 1316, close to the observed 1325.4.

As we can see, the daily vaccination rate across the globe is skewed. Even so, the central limit theorem tells us that the sample means are approximately normally distributed, which we present with 4 sample graphs.

Unlike the skewed histogram above, the histograms below, showing the sample means of 10000 random samples of sample sizes 10, 20, 30, and 40, follow an approximately normal distribution.
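The experiment can be sketched as below. This is not the project's code: the population is a simulated right-skewed (exponential) stand-in for the daily vaccination rate per million, and all variable names are hypothetical. For each sample size it draws 10000 samples, keeps each sample's mean, and compares the standard deviation of those means with the value the central limit theorem predicts.

```r
set.seed(42)
# Hypothetical right-skewed population (stand-in for the real data).
population <- rexp(20000, rate = 1 / 3500)
pop_mean <- mean(population)
pop_sd <- sd(population)

results <- do.call(rbind, lapply(c(10, 20, 30, 40), function(n) {
  # Draw 10000 samples of size n and record the mean of each one.
  sample_means <- replicate(10000, mean(sample(population, n)))
  data.frame(
    n = n,
    mean = mean(sample_means),        # stays near the population mean
    sd = sd(sample_means),            # shrinks as n grows
    expected_sd = pop_sd / sqrt(n)    # CLT prediction: sigma / sqrt(n)
  )
}))
print(round(results, 1))
```

Calling `hist(sample_means)` inside the loop would show the histograms becoming narrower and more bell-shaped as the sample size increases, even though the population itself is skewed.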

Sampling

Sampling is used to identify and analyze trends or patterns that can be seen in a subset of a larger data group. It can also be a valuable method for predicting some type of data or information. There are many different types of sampling that can be applied to data. The sampling methods used for this analysis are simple random sampling without replacement, systematic sampling, and stratified sampling, applied specifically to the vaccination progress data.

In simple random sampling, a sample of a specified size is selected from the larger group, or frame, and each observation has an equal chance of being selected. Here, a sample of 1000 observations of the daily vaccination rate per million was chosen at random without replacement. In systematic sampling, every k-th observation is selected after a random starting point. In stratified sampling, the larger data group is broken into smaller groups, and a sample of a specific size is then picked from each group.
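The three sampling schemes can be illustrated with a short base R sketch. The population, the grouping variable `region`, and the proportional allocation used for the strata are all assumptions made for illustration; the project's own stratification variable and stratum sizes are not stated here.

```r
set.seed(7)
# Hypothetical population: 5000 observations in three groups (strata).
pop <- data.frame(
  region = sample(c("North", "South", "East"), 5000, replace = TRUE,
                  prob = c(0.5, 0.3, 0.2)),
  rate = rexp(5000, rate = 1 / 3500)
)
N <- nrow(pop)
n <- 1000

# Simple random sampling without replacement: every row is equally likely.
srs <- pop[sample(N, n), ]

# Systematic sampling: every k-th row after a random start in 1..k.
k <- floor(N / n)
start <- sample(k, 1)
systematic <- pop[seq(start, by = k, length.out = n), ]

# Stratified sampling with proportional allocation (an assumption):
# each region contributes rows in proportion to its share of the population.
counts <- table(pop$region)
strata_sizes <- setNames(round(n * as.vector(counts) / N), names(counts))
stratified <- do.call(rbind, lapply(names(strata_sizes), function(g) {
  rows <- which(pop$region == g)
  pop[sample(rows, strata_sizes[[g]]), ]
}))

cat("Population mean =", mean(pop$rate), "\n")
cat("Simple random   =", mean(srs$rate), "\n")
cat("Systematic      =", mean(systematic$rate), "\n")
cat("Stratified      =", mean(stratified$rate), "\n")
```

Because the stratum sizes are rounded, the stratified sample may differ from 1000 rows by one or two.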

Finding

Comparing the sampled data with the full dataset, each of the graphs shows that the samples follow a distribution similar to the worldwide daily vaccination rate.

Findings

##  Mean of Population =  3457.68  SD =  4161.993
##  Mean of simple random sampling without replacement =  3374.289  SD =  3784.423
##  Mean of systematic sampling =  3457.68  SD =  4161.993
##  Mean of stratified sampling =  3477.305  SD =  4096.265

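As we can see, the means of the three types of samples are nearly equal to each other and to the population mean. The systematic sample matches the population mean and standard deviation exactly.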

Visualization

Visualization of total vaccine progress to date

The geographical map shows that total vaccinations broadly increase with a country's total population. A peculiar observation is that developed countries like the United States still showed lower vaccination rates than developing countries.

Visualization of daily vaccine progress